Automatic selection of phonetically distributed sentence sets for speaker adaptation with application to large vocabulary Mandarin speech recognition
Authors: J. L. Shen et al.
Abstract
This paper presents an approach to the automatic selection of phonetically distributed sentence sets for speaker adaptation and applies the concept to Mandarin speech recognition with a very large vocabulary. This is a different approach to the adaptation data selection problem. A computer algorithm is developed to select minimum sets of phonetically distributed training sentences from a text corpus defining the desired task. These sentence sets not only include an almost minimum number of words and sentences that cover the desired acoustic units, but also have statistical distributions of these acoustic-phonetic units very close to those in the given text corpus. In this way, more frequently used units can be trained with higher accuracy, improving overall performance, while the new user needs to produce only a small number of meaningful sentences to train the recognizer. Different sets of sentences, selected using different phonetic criteria that take into account the statistics of the different acoustic units in the given corpus, can then be integrated into a multi-stage adaptation procedure. With this procedure, recognition performance can be improved incrementally, stage by stage, using the adaptation data produced with these sentence sets. The proposed approach is applied to an example task of Mandarin speech recognition with a very large vocabulary, in both isolated-syllable and continuous speech modes, including different subject domains in continuous speech recognition. Although the primary results obtained in this paper are for this example task, it is believed that many of the concepts and techniques developed here will also be very useful for other speaker adaptation problems and other languages. © 1999 Academic Press
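The selection idea in the abstract can be illustrated with a small greedy sketch. This is not the algorithm from the paper, whose details are not given here; it is a generic greedy set-cover heuristic under stated assumptions. Each sentence is assumed to be already transcribed into a list of acoustic unit labels, and the function names `unit_counts`, `select_sentences`, and `distribution_gap` are hypothetical. New units are weighted by their corpus frequency and long sentences are penalised, so that a small set of short sentences covers every unit. The sketch only measures how far the selected set's unit distribution is from the corpus distribution; it does not optimise that distance.

```python
from collections import Counter


def unit_counts(sentences):
    """Count how often each unit occurs across the given sentences."""
    counts = Counter()
    for units in sentences:
        counts.update(units)
    return counts


def select_sentences(corpus):
    """Greedy cover (illustrative assumption, not the paper's method).

    corpus: list of sentences, each a list of unit labels.
    Returns the indices of the selected sentences in selection order.
    """
    freq = unit_counts(corpus)
    total = sum(freq.values())
    uncovered = set(freq)
    remaining = set(range(len(corpus)))
    selected = []
    while uncovered:
        best, best_score = None, 0.0
        for i in sorted(remaining):
            units = corpus[i]
            if not units:
                continue
            new_units = uncovered.intersection(units)
            # Reward new units by corpus frequency; penalise long sentences.
            score = sum(freq[u] / total for u in new_units) / len(units)
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break
        selected.append(best)
        remaining.discard(best)
        uncovered.difference_update(corpus[best])
    return selected


def distribution_gap(corpus, selected):
    """Total variation distance between corpus and selected unit distributions."""
    full = unit_counts(corpus)
    sub = unit_counts(corpus[i] for i in selected)
    n_full, n_sub = sum(full.values()), sum(sub.values())
    if n_full == 0 or n_sub == 0:
        return 1.0  # nothing to compare; treat as maximally different
    return 0.5 * sum(abs(full[u] / n_full - sub[u] / n_sub) for u in full)


if __name__ == "__main__":
    # Toy corpus with made-up unit labels.
    toy = [["a", "b"], ["a"], ["c", "a", "a"], ["b", "c"]]
    picked = select_sentences(toy)
    print(picked, distribution_gap(toy, picked))
```

A multi-stage variant, in the spirit of the abstract, could run the same selection several times with different unit inventories (for example, coarser units first and finer units later); that extension is likewise an assumption and not taken from the source.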
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Speaker adaptation in the Philips system for large vocabulary continuous speech recognition
The combination of Maximum Likelihood Linear Regression (MLLR) with Maximum a posteriori (MAP) adaptation has been investigated for both the enrollment of a new speaker as well as for the asymptotic recognition rate after several hours of dictation. We show that a least mean square approach to MLLR is quite effective in conjunction with phonetically derived regression classes. Results are presen...
Phonetically Distributed Continuous Speech Corpus for Thai Language
This paper proposes a work on phonetically balanced sentence (PB) and phonetically distributed sentence (PD) set, which are parts of the text prompt for speech recording in Large Vocabulary Continuous Speech Recognition (LVCSR) corpus for Thai language. Firstly, a protocol of Thai phonetic transcription and some essential rules of phonetic correction after grapheme-to-phoneme (G2P) process are ...
Remes Speaker-Based Segmentation and Adaptation in Automatic Speech Recognition
With proper training, automatic speech recognition works quite well when tested in conditions similar to the training conditions, but with a new speaker or a new environment the system performance often degrades. Speaker-based adaptation alters the speech recognition system to better match a specific speaker and thus improves the speech recognition results. In order to use speaker adaptation, t...
Journal title:
- Computer Speech & Language
Volume 13, Issue
Pages -
Publication date 1999